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Abstract.  The  approximate  evaluation  with  a  given  precision  of  matrix  and  polynomial  products 
is  performed  using  modular  arithmetic.  The  resulting  algorithms  are  numerically  stable.  At  the 
same  time  they  are  as  fast  as  or  faster  than  the  algorithms  with  arithmetic  operations  over  real  or 
complex  numbers, 
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1.  Introduction. 

This  paper  continues  our  study  (see  [1,2])  of  the  bit-operation  complexity  of  multiplication  of 
two  n  X  n  matrices  ( MM(n ))  and  of  two  n-th  degree  polynomials  ( PM[n ))  (the  latter  problem  is 
also  called  convolution  of  vectors).  We  scale  the  binary  approximations  to  the  inputs  and  outputs  of 
the  problem  to  turn  them  into  integers  and  then  perform  the  computations  over  integers  modulo  K 
where  K  is  appropriately  chosen  so  that  the  desired  approximate  values  of  outputs  are  obtained 
involving  fewer  bit-operations.  The  same  approach  can  be  used  for  DFT  problems  and  for  any 
linear  or  bilinear  (and  even  for  any  division  free)  arithmetic  computational  problems. 

This  idea  is  very  simple  but  the  results  of  our  analysis  seem  meaningful  for  three  reasons. 

The  first  reason  is  that  our  estimates  show  that  practically  all  fast  algorithms  for  MM{n)  can 
be  stabilised  (if  they  are  unstable)  with  no  sacrifice  in  their  rapidity.  We  hope  that  this  our  result 
will  stop  the  discussion  on  the  instability  of  fast  matrix  multiplication.  Such  a  discussion  recently 
has  been  renewed  in  the  literature  (cf.  [3,4])  although  with  no  substantial  progress  comparing,  say, 
with  the  results  of  Ref.  [5].  As  a  matter  of  fact,  some  ways  of  the  stabilisation  of  asymptotically  fast 
algorithms  for  MM[n)  have  already  been  described  in  [6,  sections  1, 3, 16]  but  our  present  approach 
is  more  efficient.  It  successfully  works  even  for  problems  of  MM(n]  where  n  is  moderate  or  small 
although  in  this  paper  we  focus  on  the  asymptotic  estimates  for  the  bit-operation  complexity  of 
algorithms  where  n->  oo. 

Secondly,  our  upper  estimates  for  the  bit-operation  complexity  of  PM(n)  obtained  here  and  in 
[1]  differ  from  the  lower  estimates  by  the  factor  log2  n  rather  than  log  n  (in  the  cases  of  computation 
over  complex  numbers  and  over  integers  modulo  K).  It  would  be  highly  interesting  for  the  theory  of 
computation  to  reduce  this  gap,  say,  to  log n.  This  might  be  an  easier  problem  than  the  reduction 
of  the  gap,  logn,  between  the  lower  and  upper  bounds  on  the  number  of  arithmetic  operations  for 
the  same  problem,  PM(n). 

Thirdly,  we  prove  that,  as  could  be  expected,  in  the  case  of  real  inputs  and  constants  the  tran¬ 
sition  to  computation  modulo  K  enables  us  to  accelerate  the  evaluation- interpolation  algorithms 
for  PM[n)  roughly  in  n  times  (cf.  |l]). 

In  the  next  section  we  outline  our  approach  and  give  preliminary  estimates  for  a  general  class 
of  bilinear  computational  problems.  This  class  includes  the  cases  of  AfM(n)  and  PM(n)  which 
aTe  further  studied  in  the  concluding  Section  3. 

We  will  use  the  following  notation. 

Notation.  T(MM(n)),  T(PM(n)),  tB(p),  *m(p)  are  the  four  minimum  numbers  of  bit-opera¬ 
tions  required  in  order  to  solve  the  problems  MM(n),  PM[n),  to  add/subtract  and  to  multiply 
two  p-bit  binary  integers  respectively.  (Here  and  hereafter  we  assume  that  the  moduli  of  inputs  of 
MM(n)  and  PM(n)  are  bounded  by  a  given  constant  M  >  0  and  that  the  solutions  (outputs)  are 
to  be  found  with  the  errors  less  than  a  given  c>  E  >  0.)  g(x)  =  0(/(x))  if  |j(x)|  <  c|/(x)| 

for  |x|  >  c  where  c  >  0  is  a  constant. 

Hereafter  all  logarithms  are  to  the  base  2,  the  cases  of  real  and  complex  inputs  and  outputs 
are  studied  simultaneously  and  the  resulting  asymptotic  estimates  hold  in  both  cases. 
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2.  The  general  approach  and  preliminary  estimate*. 

Let  |xi|,  |y,|,  the  moduli  of  inputs  of  a  given  bilinear  computational  problem,  be  bounded  by 
M.  Let  the  outputs, 

n 

(2.1)  wk  =  £  x,wy hi,)6(s,  k)  where  6(s,  Jc)  is  0  or  1 , 

•—i 

are  to  be  evaluated  with  the  absolute  errors  at  most  E,  that  is 

(2.2)  |to£  —  iufc|  <  E  for  all  fc. 

(The  search  of  the  approximate  solutions  to  PW(n)  and  MM(n)  is  included  here.)  The  bound 
(2.2)  is  guaranteed  if  (cf.  (2.1)) 

(2-3)  wl~  ^2  *?(.)»«.)%  *) 

#— i 

and  if  z*^ ,  yj^  are  b-bit  binary  approximations  to  x*,  y,  such  that 

(2.4)  Vt:|x‘-xi|<c,  K|<N;  Vi:|y;-y,|<e,  |y*|<|y,|; 

(2.5)  E  —  2Mne. 

(To  simplify  our  notation,  we  assume  hereafter  that 

(2.6)  e  =  EflMn  =  2~ k,  h  is  an  integer.) 

Let  b  be  the  minimum  integer  under  the  above  conditions,  b  =  0(h),  and  let 

(2.7)  Vi:  ii  =  x'/e,  Vj:  y,  =  y3/e,  V*:  w'k  =  u»fcea . 

Then  (cf.  (2.3)-(2.7))  ii,  y},  wt  are  binary  integers  and 

(2.8)  Vi:  |ii|  <  M/e,  Vy:  |y,|  <  M/e,  Vfc:  \wk\  <  (M/e)2n. 

Let  A  be  an  algorithm  with  the  inputs  x%,  y}  that  evaluates  the  i>k  and  uses  only  arithmetic 
operations  over  (Gaussian)  integers.  Then  we  can  also  assume  that  the  operations  of  A  are 
performed  modulo  K .  If 

(2.9)  K  >  K0  =  (M/e)Jn  =  4  M*n*/E2 

we  still  obtain  the  same  values  u>k,  (cf.  (2.8)).  Then  it  remains  to  shift  the  radix  point  of  wk  2 h 
bits  left  in  order  to  obtain  the  desired  w*k,  (cf.  (2.6)-(2.7)). 

Remark.  The  main  advantage  of  such  an  approach  is  that  the  approximations  with  the 
required  precision  are  evaluated  involving  only  (Gaussian)  integers  modulo  K.  This  guarantees 
numerical  stability  of  the  algorithm  assuming  that  K  is  not  very  large. 

In  the  next  two  sections  we  apply  this  approach  to  obtain  upper  estimates  for  T(MM(n))  and 
T(PM(n))  using  the  following  auxiliary  upper  estimates: 

(2.10)  t«(p)  =  0(p) ,  tm(p)  =  0(p  log  p  log  log  p) , 

(cf.  (7-10)). 
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3.  Application*  to  matrix  and  polynomial  multiplication. 

Let  an  algorithm,  A,  for  MM(n)  use  0(n*)  arithmetic  operations.  We  can  always  assume 
that  no  divisions  are  involved,  (cf.  |8]  pp.  35-38).  If  all  constants  of  A  are  integers  we  can  choose 
any  K  that  satisfies  (2.9).  If  all  constants  of  A  are  rational  we  can  reduce  the  computation  to 
the  evaluation  of  the  axbk  with  only  integer  constants.  Here  the  integer  s  is  “the  accumulated 
denominator”  of  noninteger  constants  of  A.  We  can  assume  (see  [2]  section  6)  that 

(3.1)  log  s  =  0(log  n) . 

In  this  case  the  computations  modulo  K  give  the  values  Wt  if  K  satisfies  (2.9)  and  if  K  and  a 
are  relatively  prime.  (Then  we  add  n3  multiplications,  u>*  =  t(awk)  mod  K  where  ta  —  1  mod/C. 
Their  cost  is  negligible  because  always  q  >  2,  see  (7-10].)  We  can  assume  that  for  some  K,  such 
that 

(3.2)  K  =  O(/C0) » 

both  conditions  are  satisfied.  (Recall  (3.1)  to  see  that  a  can  not  be  too  large.) 

It  is  known  that  the  exponent  q  of  MM(n)  can  be  chosen  smaller  than  2.496.  (See  [6,11],  we 
cite  [6]  where  “only”  q  <  2.52  is  obtained  because  the  origin  of  the  basic  design  of  [7]  is  explained 
in  detail  in  the  earlier  paper  [6]  while  in  the  otherwise  successful  paper  [7]  the  confusing  name, 
“Schonhage’s  construction”,  does  not  help  the  reader.)  We  can  always  assume  that  the  constants  of 
an  algorithm  for  MM  are  algebraic  numbers  obtained  from  the  solution  of  a  system  of  polynomial 
equations. 

We  have  come  to  the  following  upper  estimate.  (See  (2.10),  (3.2).  For  the  transition  from 
algebraic  constants  to  rational  ones  see,  for  instance,  [6,  section  3].) 

Theorem  1.  If  there  exists  an  algorithm  for  MM(n)  that  uses  O(n')  arithmetic  operations  then 

(3.3)  r(MM(n))  =  0(n*tm(log  K0)) 

where  tm(p)  and  Ko  satisfy  (2.9),  (2.10),  q  <  2.496. 

In  order  to  make  the  described  above  approach  applicable  to  PM(n),  it  is  sufficient  (cf.  [8, 
pp.  86-87]  or  (9,  p.  440])  to  chooBe  a  prime  K  that  satisfies  (2.9)  and  such  that 

(3.4)  K  =  e » 2r  +  1 ,  2r  >2n. 

Let  K  be  such  a  prime  and  let 

(3.5)  K  —  0(/Con“) ,  (a  >  0  is  a  constant) . 

(For  the  justification  of  (3.5)  see  (8,  pp.  86-87]  or  (9,  p.  440].)  ■ 


Then  PM(n)  can  be  solved  by  an  evaluation-interpolation  algorithm  with  DFT(2r )  using 
0(r2')  additions,  0(2r)  multiplications  modulo  K,  see  [12].  This  gives  the  following  upper 
estimate,  (cf.  (2.10)). 

Theorem  2. 

j 

(3.6)  T(PM{n))  =  0(n  log  n  logf/Con01)  +  ”tm(log(/f0«a))) 

where  Kq  satisfies  (2.9),  a  >  0  is  a  constant. 

CoroUary.  (cf.  (2.10),  (3.3)-(3.6).) 

T(MM(n))  =  0(n*) ,  T(PM(n))  =  0(nlogs  n)  if  Kq  is  a  constant. 

T(MM(n))  =  0(n*  log  n  log  log  n  log  log  log  n),  T(PM(n))  =  0(nlog*n) 

if  K0  =  0(n%),  u  >  0  is  a  constant. 

T(M M(t»))  =  0(n,"*'w  log  n  log  log  n) ,  T(PM(n))  =  0(n1+w  log  n  log  log  n) 
if  log /Co  =  nv,  t»  >  0  is  a  constant. 

(3.3),  (3.6)  can  be  compared  with  the  estimates  for  the  bit-complexity  of  the  same  algorithms 
where  the  computations  are  in  the  fields  of  real  or  complex  numbers,  see  (1,2].  In  the  complex 
case  for  M  M(n)  and  PM(n)  and  in  the  real  case  for  MM(n)  we  have  just  obtained  practically 
the  same  asymptotic  upper  estimates  as  in  (1,2],  up  to  a  factor  n*  for  MM(n)  where  c  >  0  is 
arbitrarily  Bmall.  (See  however  our  Remark  above  in  Section  2.)  The  informational  lower  bound 
of  Theorem  5.1  from  (1)  can  be  applied  to  the  case  of  the  evaluation  modulo  K  also.  Comparing 
with  the  upper  bounds  of  Theorem  5.1  from  (1]  and  of  Theorem  2  above  we  notice  that  the  gap 
between  the  lower  and  upper  bounds  on  T(PM(n))  is  log2  n  rather  than  logn  even  if  M2n/E  is 
a  constant.  (Is  the  upper  estimate  sharp  up  to  a  constant  factor?)  In  the  real  case  the  upper 
estimate  (3.6)  and  the  lower  estimate  of  Theorem  6.1  from  [1]  show  that  the  transition  to  the 
computations  modulo  K  enables  us  to  reduce  T(PM(n))  roughly  by  the  factor  n. 

Conclusion. 

It  is  interesting  to  apply  the  above  approach  to  the  bilinear  computational  problems  studied  in 
[12].  Those  are  convolution  of  vectors  and  some  problems  that  are  encountered  in  signal  processing. 
Then  the  resulting  algorithms  are  faster  than  ones  from  [12],  better  structured  and  have  guaranteed 
numerical  stability.  The  only  price  for  that  is  the  use  of  modular  arithmetic.  In  which  cases  is 
such  a  price  too  high? 
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